@max-krasnyansky max-krasnyansky commented Oct 28, 2025

Optimize dspqueue (used for sending Op requests to the NPU) by doing all response processing in-place.
This removes the need for the dedicated read threads (started internally by the dspqueue library) and essentially
eliminates all polling in that path.
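
To illustrate the idea, here is a minimal sketch of in-place response processing, where the thread that submits requests also drains the responses, so no separate reader thread or callback is involved. The names `mock_queue`, `session`, `enqueue`, and `flush` are hypothetical and chosen for illustration; this does not use the dspqueue API, and the mock "device" simply answers every request immediately.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a request/response queue shared with the NPU.
// NOT the dspqueue API: the mock "device" answers each request right away.
struct mock_queue {
    std::deque<uint32_t> responses;

    void write(uint32_t req_id) { responses.push_back(req_id); }

    // A real queue would block until a response arrives;
    // the mock returns false when nothing is pending instead.
    bool read(uint32_t & rsp_id) {
        if (responses.empty()) {
            return false;
        }
        rsp_id = responses.front();
        responses.pop_front();
        return true;
    }
};

struct session {
    mock_queue            queue;
    uint32_t              op_pending = 0; // requests sent but not yet acknowledged
    std::vector<uint32_t> completed;      // response ids in arrival order

    void enqueue(uint32_t req_id) {
        queue.write(req_id);
        op_pending++;
    }

    // In-place processing: the submitting thread reads the responses itself,
    // so there is no dedicated reader thread to wake up or contend with.
    void flush() {
        uint32_t rsp_id = 0;
        while (op_pending > 0) {
            if (!queue.read(rsp_id)) {
                break; // only reachable with the mock; a blocking read would wait here
            }
            completed.push_back(rsp_id);
            op_pending--;
        }
    }
};
```

In this sketch, calling `enqueue` a few times followed by one `flush` leaves `op_pending` at zero and `completed` holding the request ids in submission order.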

(For the curious, the dspqueue CPU side sources can be found here
https://github.com/qualcomm/fastrpc/tree/main/src/dspqueue)

We can also bump the CPU backend thread counts now for the default use-cases since we still rely on the
CPU for Flash Attention and a few other Ops.

We're not going to release the buffers without flushing the session queue.
So there is no need to inc/dec the refcounts for every request.
We also don't need to include those bufs in the response.

We can use more CPU cores now that the dedicated dspqueue polling threads are not used (i.e. no contention).
Also enable more aggressive polling for now, since we still map Flash Attention (and a few other kernels) to
the CPU and those dspqueue threads were keeping the CPU cores at higher clock freqs.
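
The buffer-lifetime reasoning above (buffers are only released after the session queue is flushed) can be sketched roughly as follows. The types `buffer` and `buf_session` are hypothetical and are not the ggml-hexagon data structures; `flush` is a stand-in for waiting on all outstanding responses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct buffer {
    std::vector<uint8_t> data;
    bool                 released = false;
};

struct buf_session {
    uint32_t op_pending = 0; // requests currently in flight

    // The request only borrows the buffer: no per-request refcount inc/dec.
    void enqueue(const buffer & buf) {
        assert(!buf.released);
        op_pending++;
    }

    // Stand-in for waiting until every outstanding response has been read.
    void flush() { op_pending = 0; }

    // Release always flushes first, so no in-flight request can still
    // reference the buffer once its storage is freed.
    void release(buffer & buf) {
        flush();
        buf.data.clear();
        buf.released = true;
    }
};
```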

@max-krasnyansky (Collaborator, Author) commented

@l3utterfly this would be interesting for your use-case (i.e. an APK with ggml-hexagon enabled).
Please give this a shot when you get the chance.

@max-krasnyansky max-krasnyansky requested a review from lhez October 28, 2025 15:52

@l3utterfly (Contributor) commented

@max-krasnyansky Thank you! I will test this out!

@github-actions github-actions bot added the script and ggml labels Oct 28, 2025

@lhez lhez left a comment

Looks good to me

lhez commented Oct 29, 2025

Some server tests are failing, but should be unrelated.

@max-krasnyansky max-krasnyansky merged commit 3eb2be1 into ggml-org:master Oct 29, 2025
75 of 83 checks passed
@max-krasnyansky max-krasnyansky deleted the hexagon-dspqueue-opts branch October 29, 2025 13:29